TwiSty: A Multilingual Twitter Stylometry Corpus for Gender and Personality Profiling

نویسندگان

  • Ben Verhoeven
  • Walter Daelemans
  • Barbara Plank
چکیده

Personality profiling is the task of detecting personality traits of authors based on writing style. Several personality typologies exist, however, the Myers-Briggs Type Indicator (MBTI) is particularly popular in the non-scientific community, and many people use it to analyse their own personality and talk about the results online. Therefore, large amounts of self-assessed data on MBTI are readily available on social-media platforms such as Twitter. We present a novel corpus of tweets annotated with the MBTI personality type and gender of their author for six Western European languages (Dutch, German, French, Italian, Portuguese and Spanish). We outline the corpus creation and annotation, show statistics of the obtained data distributions and present first baselines on Myers-Briggs personality profiling and gender prediction for all six languages.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Gender Profiling for Slovene Twitter communication: the Influence of Gender Marking, Content and Style

We present results of the first gender classification experiments on Slovene text to our knowledge. Inspired by the TwiSty corpus and experiments (Verhoeven et al., 2016), we employed the Janes corpus (Erjavec et al., 2016) and its gender annotations to perform gender classification experiments on Twitter text comparing a token-based and a lemma-based approach. We find that the token-based appr...

متن کامل

Author Profiling of Twitter Users: Notebook for PAN at CLEF 2015

In this paper, we focused on profiling authors on age, gender, and five personality traits. The corpus consists of anonymized twitter posts categorized into 4 different languages. Our proposed approach was to use a combination of tfidf, function words, stylistic features, and text bigrams, and used an SVM for each task.

متن کامل

Identification of Author Personality Traits using Stylistic Features: Notebook for PAN at CLEF 2015

Author profiling is the task of determining the age, gender or type of the author's personality by studying their sociolect aspect, that is, how the language is shared by people. This paper presents the COMSATS Institute of Information Technology, Lahore entry for the PAN 2015 competition on Author Profiling task. Our proposed system is based on stylometry features. We implemented 29 different ...

متن کامل

Author Profiling using Stylometric and Structural Feature Groupings

In this paper we present an approach for the task of author profiling. We propose a coherent grouping of features combined with appropriate preprocessing steps for each group. The groups we used were stylometric and structural, featuring among others, trigrams and counts of twitter specific characteristics. We address gender and age prediction as a classification task and personality prediction...

متن کامل

A Survey on Authorship Profiling Techniques

Authorship analysis is a text analysis technique that is visualized mainly in three different techniques namely Authorship Profiling, Authorship Identification and Plagiarism Detection. In this paper a brief survey on the recent developments in the area of author profiling approaches were presented. Authorship Profiling is to ascertain various authors characteristics like age, gender, native co...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016